Introduction

“The simple graph has brought more information to the data analyst’s mind than any other device.”
— John Tukey

  • Data visualization is the creation and study of the visual representation of data.
  • Many tools for visualizing data (R is one of them)
  • Many approaches/systems within R for making data visualizations (ggplot2 is one of them)

ggplot2 \(\in\) tidyverse

  • ggplot2: tidyverse’s data visualization package
  • gg in “ggplot2” stands for Grammar of Graphics
  • Inspired by the book Grammar of Graphics by Leland Wilkinson
  • A grammar of graphics is a tool that enables concise description of components of a graphic
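As a rough illustration of what "components of a graphic" can mean in practice, the sketch below builds a plot and then inspects the pieces ggplot2 keeps on the plot object. The data frame `toy` and its column `rate` are made up for this example and are not part of the workshop data.

```r
library(ggplot2)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  quarter = as.Date(c("2012-01-01", "2012-04-01", "2012-07-01", "2012-10-01")),
  rate    = c(1.2, 1.0, 0.9, 0.7)
)

p <- ggplot(data = toy, mapping = aes(x = quarter, y = rate)) +
  geom_point()

# The plot object stores each grammar component separately
component_names <- names(p$mapping)  # aesthetics that were mapped
n_layers        <- length(p$layers)  # number of geom layers added so far
n_rows          <- nrow(p$data)      # the data the plot was initialized with
```

Each of these pieces (data, mapping, layers) is introduced one at a time in the slides that follow.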


Dataset

Stanford Open Policing Project

Police Searches Drop Dramatically in States that Legalized Marijuana

  • Police Stop Data
    • state, driver race, stop rate, marijuana legalization status
library(tidyverse)  # provides read_csv(), filter(), mutate(), %>% and ggplot2

stops <- read_csv("./data/opp-search-marijuana_state.csv") %>% 
  filter(state %in% c("WA", "CO")) %>% 
  mutate(legalization_status = ifelse(quarter <= "2013-01-01", "pre","post"),
         search_rate_100 = search_rate * 100) 

Basic ggplot2 syntax

  • DATA
  • MAPPING
  • GEOM

Your turn!

Exercise: Determine which variable in the dataset is mapped to which aesthetic of the plot (x-axis, y-axis, etc.).



Step-by-step


ggplot(data = stops)


ggplot(data = stops, mapping = aes(x = quarter, y = search_rate_100))


ggplot(data = stops, mapping = aes(x = quarter, y = search_rate_100)) +
  geom_point()


ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) + 
  geom_point()


ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) + 
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'


ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) + 
  geom_smooth(method = "loess")
## `geom_smooth()` using formula 'y ~ x'


ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) + 
  geom_smooth(method = "loess", se = FALSE)
## `geom_smooth()` using formula 'y ~ x'


ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) + 
  geom_smooth(method = "loess", se = FALSE) +
  scale_color_viridis_d()
## `geom_smooth()` using formula 'y ~ x'


ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) + 
  geom_smooth(method = "loess", se = FALSE) +
  scale_color_viridis_d() +
  theme_minimal()
## `geom_smooth()` using formula 'y ~ x'


ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) + 
  geom_smooth(method = "loess", se = FALSE) +
  scale_color_viridis_d() +
  theme_minimal() +
  labs(x = "Year", y = "Search Rate", color = "Driver Race",
       title = "Washington Highway Patrol Searches", subtitle = "Searches per Hundred Stops")
## `geom_smooth()` using formula 'y ~ x'


ggplot, the making of

  1. “Initialize” a plot with ggplot()
  2. Add layers with geom_ functions
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

# For example, with the mpg dataset that ships with ggplot2:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

Mapping

Size data points by a numerical variable

ggplot(data = stops, aes(x = quarter, y = search_rate_100, size = search_rate_100)) +
  geom_point()


Set alpha value

ggplot(data = stops, aes(x = quarter, y = search_rate_100, size = search_rate_100)) +
  geom_point(alpha = 0.5)


Your turn!

Exercise: Using the information at https://ggplot2.tidyverse.org/articles/ggplot2-specs.html, add color, size, alpha, and shape aesthetics to your graph. Experiment. Do different things happen when you map aesthetics to discrete versus continuous variables? What happens when you use more than one aesthetic?

stops %>% ggplot(aes(x = quarter , y = search_rate_100, color = driver_race)) + 
  geom_point() + 
  theme_minimal(base_size = 12)  


Mappings can be at the geom level

ggplot(data = stops) +
  geom_point(mapping = aes(x = quarter, y = search_rate_100))


Different mappings for different geoms

ggplot(data = stops, mapping = aes(x = quarter, y = search_rate_100)) +
  geom_point() +
  geom_smooth(aes(color = driver_race), method = "loess", se = FALSE)
## `geom_smooth()` using formula 'y ~ x'


Set vs. map

  • To map an aesthetic to a variable, place it inside aes()
ggplot(data = stops, 
  mapping = aes(x = quarter, 
                y = search_rate_100,
            color = driver_race)) +
  geom_point() 


  • To set an aesthetic to a value, place it outside aes()
ggplot(data = stops, 
  mapping = aes(x = quarter, 
                y = search_rate_100)) +
  geom_point(color = "red") 

ggplot(data = stops, 
  mapping = aes(x = quarter, 
                y = search_rate_100)) + 
  geom_point(color = "#63B3E8") 


Data can be passed in

stops %>%
  ggplot(aes(x = quarter, y = search_rate_100)) +
    geom_point()


Parameters can be unnamed

ggplot(stops, aes(x = quarter, y = search_rate_100)) +
  geom_point()


Assign ggplot() to objects for layering

p <- ggplot(stops, aes(x = quarter, y = search_rate_100)) +
  geom_point()

p + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'


Common early pitfalls

Mappings that aren’t

ggplot(data = stops) +
  geom_point(aes(x = quarter, y = search_rate_100, color = "blue"))

Mappings that aren’t

ggplot(data = stops) +
  geom_point(aes(x = quarter, y = search_rate_100), color = "blue")
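To see why the placement of `color = "blue"` matters, it can help to look at the colours ggplot2 actually assigns. The sketch below is a hedged illustration on made-up data (`toy` and `rate` are hypothetical names, not the workshop dataset). It uses `layer_data()` to pull out the computed colours for each version.

```r
library(ggplot2)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  quarter = as.Date(c("2012-01-01", "2012-04-01", "2012-07-01")),
  rate    = c(1.2, 1.0, 0.9)
)

# "blue" inside aes(): treated as a one-level variable and
# coloured by the default discrete colour scale
mapped <- ggplot(toy) +
  geom_point(aes(x = quarter, y = rate, color = "blue"))

# "blue" outside aes(): a fixed value applied to every point
fixed <- ggplot(toy) +
  geom_point(aes(x = quarter, y = rate), color = "blue")

mapped_colours <- unique(layer_data(mapped, 1)$colour)
fixed_colours  <- unique(layer_data(fixed, 1)$colour)
```

In the mapped version the points get whatever colour the default scale picks for its first level, and a legend labelled "blue" appears. Only the fixed version draws blue points.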

Your turn!

Exercise: What is wrong with the following?

stops %>%
  ggplot(aes(x = quarter, y = search_rate_100, color = legalization_status)) %>%
    geom_point()

+ and %>%

What is wrong with the following?

stops %>%
  ggplot(aes(x = quarter, y = search_rate_100, color = legalization_status)) %>%
    geom_point()
## Error: `mapping` must be created by `aes()`
## Did you use %>% instead of +?
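The error message points at the fix: the pipe hands data to `ggplot()`, but layers are added with `+`. A minimal sketch of the corrected pattern, using a made-up data frame (`toy`, `rate`, and `status` are hypothetical stand-ins for the workshop columns):

```r
library(ggplot2)
library(dplyr)  # provides %>%

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  quarter = as.Date(c("2012-07-01", "2012-10-01", "2013-01-01", "2013-04-01")),
  rate    = c(1.2, 1.0, 0.6, 0.5),
  status  = c("pre", "pre", "post", "post")
)

# %>% passes the data into ggplot(); + adds the geom layer
p <- toy %>%
  ggplot(aes(x = quarter, y = rate, color = status)) +
  geom_point()
```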

Basic plot

ggplot(data = stops, aes(x = quarter, y = search_rate_100)) +
  geom_point() 


Two layers

ggplot(data = stops, aes(x = quarter, y = search_rate_100)) +
  geom_point()  +
  geom_line()

The power of groups

ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() + 
  geom_line()


Now we’ve got it

ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_smooth(span = 0.2, se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'


Control data by layer

ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = filter(stops, search_rate_100 < .2),
             size = 5, color = "gray") +
  geom_point()


Your turn!

Exercise: Work with your neighbor to sketch what the following plots will look like. No cheating! Do not run the code, just think through the code for the time being.

pre_legalization_high <- stops %>%
  filter((quarter < "2013-01-01" & search_rate_100 > 1.0))
ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = pre_legalization_high, size = 5, color = "gray") +
  geom_point() +
  geom_text(data = pre_legalization_high, aes(y = search_rate_100 + .05, label = search_rate_100), 
            size = 2, color = "black")

ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point()


ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  geom_point(data = pre_legalization_high, size = 5, color = "gray")


ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = pre_legalization_high, size = 5, color = "gray") +
  geom_point()


ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = pre_legalization_high, size = 5, color = "gray") +
  geom_point() +
  geom_text(data = pre_legalization_high, aes(y = search_rate_100, label = search_rate_100), 
            size = 2, color = "black")


ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = pre_legalization_high, size = 5, color = "gray") +
  geom_point() +
  geom_text(data = pre_legalization_high, aes(y = search_rate_100 + .05, label = search_rate_100), 
            size = 2, color = "black")


library(ggrepel)  # provides geom_text_repel() and geom_label_repel()

ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = pre_legalization_high, size = 5, color = "gray") +
  geom_point() + 
  geom_text_repel(data = pre_legalization_high, 
                  aes(x = quarter, y = search_rate_100, 
                      label = as.character(quarter)), 
                  size = 3, color = "black")


ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = pre_legalization_high, size = 5, color = "gray") +
  geom_point() + 
  geom_label_repel(data = pre_legalization_high, 
                  aes(x = quarter, y = search_rate_100, 
                      label = as.character(quarter)), 
                  size = 3, color = "black")


Your turn!

Exercise: How would you fix the following plot?

ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_smooth(color = "blue")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'


Specifying colors

ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  scale_color_manual(values = c("#FF6EB4", "#00BFFF", "#008B8B")) + 
  geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Splitting over facets

ggplot(data = stops, aes(x = quarter, y = search_rate_100)) +
  geom_smooth() +
  facet_wrap( ~ driver_race)


facet_grid

ggplot(data = stops, aes(x = quarter, y = search_rate_100)) +
  geom_line() +
  facet_grid(state ~ driver_race)


facet_grid

ggplot(data = stops, aes(x = quarter, y = search_rate_100)) +
  geom_line() +
  facet_grid(driver_race ~ state)


Scales and legends


Scale transformation

ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  scale_y_reverse()


Scale transformation

ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  scale_y_sqrt()


Scale details

ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  scale_y_continuous(breaks = c(0, 0.25, 0.5, .75, 1.0))

Themes

Overall themes

ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  theme_bw()


Overall themes

ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  theme_dark() 


Customizing theme elements

ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = 90))


Combining several plots into a grid

wa_stops <- stops %>% filter(state == "WA") %>% 
  ggplot(aes(x = quarter, y = search_rate_100, color = driver_race)) + 
  geom_smooth(se = FALSE) + 
  labs(title = "Washington")

co_stops <- stops %>% filter(state == "CO") %>% 
  ggplot(aes(x = quarter, y = search_rate_100, color = driver_race)) + 
  geom_smooth(se = FALSE) + 
  labs(title = "Colorado") + 
  theme(legend.position = "none")

Combining several plots into a grid

library(patchwork)  # lets + and / combine ggplot objects into one figure

wa_stops + co_stops
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

(wa_stops / co_stops)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Interactivity

plotly::ggplotly(wa_stops)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Your turn!


Final Exercise:

Recreate this chart

Starter code:

stops %>% filter(state == "WA") %>% 
  ggplot(aes(quarter, search_rate_100, color = driver_race)) +
  geom_point() +
  geom_smooth(method = lm, se = FALSE) 
## `geom_smooth()` using formula 'y ~ x'

  • ‘?labs’ layer controls title, subtitle, caption, etc.

  • ‘?scale_color_manual’ layer lets you assign your own colors to the levels of a variable

  • ‘?geom_vline’ layer draws a vertical line across the plot. (hint: the x-axis is a date data type)

  • ‘?theme’ controls the non-data elements of the plot like size of text, angle of axis ticks, etc.

  • ‘?annotate’ creates a text annotation layer. As with geom_vline, the x-coordinate must be given as a date

  • Experiment with themes
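The hint about date coordinates can be sketched on made-up data. In the example below, `toy`, `rate`, the `cutoff` date, and the label text are all placeholders chosen for illustration, not the values used in the target chart.

```r
library(ggplot2)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  quarter = as.Date(c("2012-01-01", "2012-07-01", "2013-01-01", "2013-07-01")),
  rate    = c(1.2, 1.0, 0.6, 0.5)
)

# Placeholder reference date; as.Date() makes it match the date x-axis
cutoff <- as.Date("2012-12-01")

p <- ggplot(toy, aes(x = quarter, y = rate)) +
  geom_point() +
  # Vertical line positioned with a Date value
  geom_vline(xintercept = cutoff, linetype = "dashed") +
  # Text annotation positioned with the same Date value
  annotate("text", x = cutoff, y = 1.1, label = "reference date", hjust = -0.1)
```

If a Date value is not accepted by an older ggplot2 installation, wrapping it as `as.numeric(cutoff)` is a commonly used workaround.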

Themes Vignette

To really master themes:

ggplot2.tidyverse.org/articles/extending-ggplot2.html#creating-your-own-theme



Recap


The basics

  • map variables to aesthetics
  • add “geoms” for visual representation layers
  • scales can be independently managed
  • legends are automatically created
  • statistics are sometimes calculated by geoms
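The last point is easy to miss, so here is a small hedged sketch. The slides rely on `geom_smooth()`, whose fitted values are harder to check by eye; this example swaps in `geom_bar()`, which counts rows per group by default. The data frame `toy` and its group labels are made up for illustration.

```r
library(ggplot2)

# Hypothetical toy data: one row per stop, labelled by an invented group
toy <- data.frame(driver_race = c("A", "A", "B", "A", "B", "C"))

# Only x is mapped; geom_bar()'s default stat tallies the rows in each group
p <- ggplot(toy, aes(x = driver_race)) +
  geom_bar()

# layer_data() exposes the values the layer computed before drawing
computed    <- layer_data(p, 1)
bar_heights <- computed$count
```

The bar heights never appear in `toy`. They are computed by the layer's statistic when the plot is built.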

ggplot2 template

Make any plot by filling in the parameters of this template

knitr::include_graphics("./img/ggplot2-template.png")


Learn more